C++: Don't use definitionByReference for data flow#1777
Conversation
| // f(&variable) | ||
| va = argument.(AddressOfExpr).getOperand() | ||
| or | ||
| // f(&array[0]) |
There was a problem hiding this comment.
should this be generalized to handle multi-dimensional arrays?
There was a problem hiding this comment.
Is it normal to pass a multi-dimensional array as f(&my3dArray[0][0][0])? I'd expect people to just write f(my3dArray). What should we do about f(&my3dArray[0][0]) (two zeros)?
Can you find other common cases that aren't handled right here? I wouldn't be surprised if there are more problems.
There was a problem hiding this comment.
I don't think it would be normal to do that, but I wouldn't be surprised if something like f(&my3dArray[x][y][z]) is significantly more common.
This implicitly covers FieldAccess as well - I assume that's intended?
There was a problem hiding this comment.
Looking at this again, I'm not at all sure if FieldAccess should be included here.
There was a problem hiding this comment.
Thank you for the reminder. I didn't see the FieldAccess comment in your last review.
The referenceArgument predicate defines the DefinitionByReference class, which is used for two things. The first thing is that it's used in the TBlockVar constructor, but that constructor requires not v instanceof Field and so excludes fields. The second thing is that it determines the charpred of DefinitionByReferenceNode. The DefinitionByReferenceNodes that don't assign to fields will have a FlowVar associated with them, and there will be flow out of them thanks to this rule. Those that assign to fields will not have any flow out of them by default (because they have no FlowVar), but they will still be available in case users of the library want to add custom data flow edges out of them with isAdditionalFlowStep.
In summary, I think FieldAccess should be included here because it makes the library more flexible. You might even argue that arguments that have no VariableAccess should be included too, but then we're getting into corner cases that are probably best handled with IR-based data flow.
geoffw0
left a comment
There was a problem hiding this comment.
LGTM.
Seems to slow things down a little (LossyFunctionResultCast.ql, mysql, 396s -> 414s) as most of these changes do. Once we've made all the improvements we want to, it may be worth spending a bit of time profiling what we end up with.
|
|
||
| - The predicate `Variable.getAnAssignedValue()` now reports assignments to fields resulting from aggregate initialization (` = {...}`). | ||
| - The predicate `TypeMention.toString()` has been simplified to always return the string "`type mention`". This may improve performance when using `Element.toString()` or its descendants. | ||
| - The `DataFlow::DefinitionByReferenceNode` class now considers `f(x)` to be a definition of `x` when `x` is a variable of pointer type. It no longer considers deep paths such as `f(&x.myField)` to be definitions of `x`. These changes are in line with the user expectations we've observed. |
There was a problem hiding this comment.
Makes sense to me, even if there's a hint of heuristic about this.
|
Samate tests job: https://jenkins.internal.semmle.com/job/Security/job/SAMATE/job/SAMATE-cpp/461/ I'm a bit paranoid about the samate job failing because it's only run once per week by default (which is usually plenty), but we're merging a lot of data flow changes this week! |
| argumentType = argument.getFullyConverted().getType().stripTopLevelSpecifiers() | ||
| | | ||
| argumentType instanceof ReferenceType and | ||
| not argumentType.(ReferenceType).getBaseType().isConst() and |
There was a problem hiding this comment.
What about rvalue references to non-const types?
There was a problem hiding this comment.
I believe those are ruled out with this predicate. The argument is restricted to be a VariableAccess, and a VariableAccess is never an rvalue unless you send it through std::move (or cast it yourself, but that should be very rare).
|
The Samate test run succeeded. I'm happy to merge this when we're ready to have it in master. |
|
Change note needs to move to 1.23. |
6facf14 to
604996d
Compare
|
I've updated this PR to put the change note in the right place. @rdmarsh2, it's true that these changes are syntactic in nature and incomplete, but it sounds like you agree that the most common cases are covered. How about we leave the more semantically accurate modelling to the IR-based data flow library? |
The data flow library conflates pointers and objects enough for the `definitionByReference` predicate to be too strict in some cases. It was too permissive in other cases that are now (or will be) handled better by field flow. See also the change note entry.
|
Actually, I have another commit coming to this branch to fix the test failure that has started appearing after I rebased. I haven't figured out exactly why the test fails. I think it's a bug that's revealed but not caused by this PR. |
Data flow probably never worked when a variable declared in a `ConditionDeclExpr` was modeled with `BlockVar`. That combination did not come up in testing before the last commit.
a2976de to
067c55a
Compare
|
The tests broke after rebasing this branch on
|
geoffw0
left a comment
There was a problem hiding this comment.
First commit still LGTM.
Second commit LGTM.
|
I'm still unsure about the |
| VariableAccess va; | ||
|
|
||
| DefinitionByReference() { definitionByReference(va, definedExpr) } | ||
| DefinitionByReference() { referenceArgument(va, definedExpr) } |
There was a problem hiding this comment.
It might look strange that this is not restricted in this charpred. That's because the full extent of this class includes the charpred of the superclass, which relates this to definedExpr, and va is functionally determined by definedExpr.
There was a problem hiding this comment.
I've added a commit that turns the above remark into a code comment.
The data flow library conflates pointers and objects enough for the
definitionByReferencepredicate to be too strict in some cases. It was too permissive in other cases that are now (or will be) handled better by field flow.See also the change note entry.